High-dimensional Similarity Joins

Authors

  • A. D. Narasimhalu
  • S. Christodoulakis
Abstract

The approach first finds similar "atomic" subsequences and then stitches the atomic subsequence matches together to obtain similar subsequences or similar sequences. Each sequence is broken into atomic subsequences using a sliding window of size w. The atomic subsequences are then mapped to points in a w-dimensional space. Finding similar atomic subsequences thus corresponds to finding pairs of w-dimensional points within distance ε of each other, using the L1 norm. (See [2] for the rationale behind this approach.)
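The sliding-window mapping and the pairwise distance test described above can be sketched in a few lines of Python. This is only a minimal brute-force illustration, not the paper's index-based algorithm: the function names, the threshold parameter `eps`, and the choice to join a sequence's windows against themselves are all assumptions made for the example.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def atomic_subsequences(seq: Sequence[float], w: int) -> List[Point]:
    """Slide a window of size w over seq; each window becomes a w-dimensional point."""
    if w <= 0:
        raise ValueError("w must be positive")
    # A sequence shorter than w yields no windows (empty range).
    return [tuple(seq[i:i + w]) for i in range(len(seq) - w + 1)]


def l1_distance(p: Point, q: Point) -> float:
    """L1 (Manhattan) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def similarity_self_join(points: List[Point], eps: float) -> List[Tuple[int, int]]:
    """Brute-force join: index pairs (i, j), i < j, whose points are within eps.

    Hypothetical helper for illustration; it compares every pair, which is
    exactly the quadratic cost an index structure is meant to avoid.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if l1_distance(p, q) <= eps
    ]


sequence = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0]
points = atomic_subsequences(sequence, w=3)
matches = similarity_self_join(points, eps=0.5)
print(matches)
```

With this input the only pair within `eps=0.5` is the first and last window, which are identical. Because the brute-force join compares every pair of points, it is practical only for small inputs.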

Similar resources

A Fast Algorithm for high-dimensional Similarity Joins

Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of find...

Comparing MapReduce-Based k-NN Similarity Joins on Hadoop for High-Dimensional Data

Similarity joins represent a useful operator for data mining, data analysis and data exploration applications. With the exponential growth of data to be analyzed, distributed approaches like MapReduce are required. So far, the state-of-the-art similarity join approaches based on MapReduce mainly focused on the processing of low-dimensional vector data. In this paper, we revisit and investigate ...

High-dimensional Proximity Joins

Many emerging data mining applications require a proximity (similarity) join between points in a high-dimensional domain. We present a new algorithm that utilizes a new data structure, called the ε-kd tree, for fast spatial proximity joins on high-dimensional points. This data structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal c...

High-Dimensional Similarity Joins

Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of f...

Fast similarity join for multi-dimensional data

To appear in Information Systems Journal, Elsevier, 2005. The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the execution of high-dimensional joins over large amounts of disk-based data. The increasing sizes of main memor...

MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives ...

Publication date: 1997